Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging
Author
Abstract
Recently, there has been a rebirth of empiricism in the field of natural language processing. Manual encoding of linguistic information is being challenged by automated corpus-based learning as a method of providing a natural language processing system with linguistic knowledge. Although corpus-based approaches have been successful in many different areas of natural language processing, it is often the case that these methods capture the linguistic information they are modelling indirectly in large opaque tables of statistics. This can make it difficult to analyze, understand and improve the ability of these approaches to model underlying linguistic behavior. In this paper, we will describe a simple rule-based approach to automated learning of linguistic knowledge. This approach has been shown for a number of tasks to capture information in a clearer and more direct fashion without a compromise in performance. We present a detailed case study of this learning method applied to part-of-speech tagging.
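The abstract describes the learner only at a high level. Below is a minimal Python sketch of transformation-based error-driven learning for tagging. It rests on several assumptions and is not the paper's implementation. The initial-state annotator gives each word its most frequent training tag. There is a single rule template, "change tag A to B when the previous tag is Z" (the full tagger uses a larger template set). Each rule is applied simultaneously within a sentence. Unknown words default to `NN`. At every step the learner greedily keeps the rule with the largest net error reduction on the training corpus. All function names (`most_likely_tags`, `apply_rule`, `rule_score`, `learn_rules`, `tag`) are hypothetical.

```python
from collections import Counter, defaultdict

# A rule (from_tag, to_tag, prev_tag) means:
# "change from_tag to to_tag when the preceding word is tagged prev_tag".
# This single template is an illustrative assumption.


def most_likely_tags(corpus):
    """Initial-state annotator: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, t in sentence:
            counts[word][t] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def apply_rule(tags, rule):
    """Apply one rule, testing every context against the tags as they
    were before the rule fired (a simultaneous-update simplification)."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def rule_score(current, gold, rule):
    """Net errors removed by a rule: corrections minus newly introduced errors."""
    score = 0
    for tags, truth in zip(current, gold):
        new = apply_rule(tags, rule)
        for old_t, new_t, gold_t in zip(tags, new, truth):
            if old_t != new_t:
                score += (new_t == gold_t) - (old_t == gold_t)
    return score


def learn_rules(corpus, min_score=1):
    """Greedy error-driven loop: pick the best-scoring rule, apply it to the
    training tags, and repeat until no rule removes at least min_score errors."""
    lexicon = most_likely_tags(corpus)
    gold = [[t for _, t in s] for s in corpus]
    current = [[lexicon[w] for w, _ in s] for s in corpus]
    learned = []
    while True:
        # Propose candidate rules only from positions that are still mistagged.
        candidates = set()
        for tags, truth in zip(current, gold):
            for i in range(1, len(tags)):
                if tags[i] != truth[i]:
                    candidates.add((tags[i], truth[i], tags[i - 1]))
        if not candidates:
            break
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda r: rule_score(current, gold, r))
        if rule_score(current, gold, best) < min_score:
            break
        learned.append(best)
        current = [apply_rule(tags, best) for tags in current]
    return lexicon, learned


def tag(words, lexicon, rules, default="NN"):
    """Tag new text: initial-state tags, then the learned rules in order."""
    tags = [lexicon.get(w, default) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


corpus = [
    [("to", "TO"), ("race", "VB")],
    [("the", "DT"), ("race", "NN")],
    [("a", "DT"), ("race", "NN")],
]
lexicon, rules = learn_rules(corpus)
print(rules)                                    # [('NN', 'VB', 'TO')]
print(tag(["to", "race"], lexicon, rules))      # ['TO', 'VB']
```

In this toy corpus, "race" is initially tagged `NN` everywhere because that is its most frequent tag. The learner then acquires one readable rule that retags it as `VB` after `TO`. This is the kind of direct, inspectable knowledge the abstract contrasts with opaque statistical tables.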
Similar resources
Towards Learning Error-Driven Transformations for Information Extraction
Transformation-based and error-driven induction of rules has proven to be a successful and applicable technique for Part-of-Speech tagging and other natural language processing tasks. This paper presents the ongoing work to create a universal algorithm for learning transformations in the domain of information extraction and semantic annotation. Such a method can be applied to different and usef...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
A Part-of-Speech Tagging System for the Persian Language
Part-Of-Speech (POS) tagging is essential for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
Formalization of Transformation-Based Learning
Research in automatic Part of Speech (POS) tagging has been dominated by Markov Model (MM) taggers. Brill [1, 3, 6] has recently described a transformation-based system with comparable accuracy, and simpler algorithms and representation than MM taggers. We present a set-based formal model of natural language ambiguity and semantic tagging that forms a basis for the generalisation of the transf...
Multidimensional transformation-based learning
This paper presents a novel method that allows a machine learning algorithm following the transformation-based learning paradigm (Brill, 1995) to be applied to multiple classification tasks by training jointly and simultaneously on all fields. The motivation for constructing such a system stems from the observation that many tasks in natural language processing are naturally composed o...
Journal:
- Computational Linguistics
Volume 21
Published: 1995